Alternate maxPolygonToCellsSize implementation #805
Conversation
You know you're saying that on average, calculating the max size takes about 290 nanoseconds. I just can't see anyone saying they don't have 290ns of time to spare, myself. ;)

The polygonToCells tests are what's failing, btw. I think we hardwired some of the

About to look, but I'll bet the failure has to do with adding the new polyfill mode - test issue, not code issue

Agree, but it can add up in a hot loop, and there are probably some pathological cases (e.g. polygons with millions of vertexes). Those might be slow for the existing estimator too though.
Yes, but when you were talking about the current algo being constant-time, it's only constant-time after the bounding box has been determined. But also, if we are keeping the
Hmm... But it should be backwards compatible, right? We already required a mode that was forced to be zero with V4, so the actual function call signature and behavior should not have changed. It should only be things like
The tests still use the old estimator - it was the way we were checking for invalid modes in the tests (using `CONTAINMENT_INVALID`).
On this, I'm fine with reducing the max size, but the fair comparison would be old algo + old estimator vs new algo + new estimator. The old algo was coupled with a lot of false positives to deal with, right?
Yes, I did - I'm updating the benchmark in this PR. With the new estimator, the new algo is slower by res 4:
Compare with the previous results, which didn't include either estimator. I think this is probably an acceptable tradeoff - we may not need/want to use the estimator at all once we have streaming, and the old estimator was actually wrong for the new modes, and possibly for some polar areas. But it's not no-cost.
```diff
@@ -48,7 +48,8 @@ typedef enum {
     CONTAINMENT_CENTER = 0,      ///< Cell center is contained in the shape
     CONTAINMENT_FULL = 1,        ///< Cell is fully contained in the shape
     CONTAINMENT_OVERLAPPING = 2, ///< Cell overlaps the shape at any point
-    CONTAINMENT_INVALID = 3      ///< This mode is invalid and should not be used
+    CONTAINMENT_OVERLAPPING_BBOX = 3, ///< Cell bounding box overlaps shape
```
Is this purely an internal detail, or should it be exposed to the user?
I think users could potentially benefit from this - it's a faster (2-3x) version of `CONTAINMENT_OVERLAPPING` that may contain false positives, but no false negatives. Not appropriate for most use cases, but some users might find it useful.
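The "false positives but no false negatives" property follows from the geometry: if a cell truly overlaps the shape, their bounding boxes must overlap, but two boxes can overlap without the underlying shapes touching. A minimal sketch of the cheap rectangle test (hypothetical helper, not the H3 API; boxes crossing the antimeridian are ignored for simplicity):

```python
def bbox_overlaps(a, b):
    """Coarse overlap test for two lat/lng boxes.

    Each box is (south, west, north, east) in degrees. Two rectangles
    overlap iff they overlap on both axes, so this is four comparisons
    instead of a full boundary-intersection test. It can report a box
    overlap where the real shapes don't touch (false positive), but
    never misses a true overlap (no false negatives).
    """
    a_s, a_w, a_n, a_e = a
    b_s, b_w, b_n, b_e = b
    return a_s <= b_n and b_s <= a_n and a_w <= b_e and b_w <= a_e

cell = (0, 0, 10, 10)
shape = (5, 5, 15, 15)   # overlaps the cell's box
far = (40, 40, 50, 50)   # disjoint from the cell's box
print(bbox_overlaps(cell, shape), bbox_overlaps(cell, far))  # True False
```

For a max-size estimator this one-sided error is exactly what's wanted: over-counting is safe, under-counting is not.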
```c
        return iter.error;
    }

static int MAX_SIZE_CELL_THRESHOLD = 10;
```
Per @dfellis - we could potentially allow the caller to configure this value by supplying it in the flags. The caller could give either a specific value, or an "order of magnitude" value (an exponent of 10), which could make this more accurate.
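One way the order-of-magnitude idea could look: reserve a few bits of the flags word for a base-10 exponent, with zero meaning "use the built-in default". This is only a sketch; the bit positions, masks, and names below are hypothetical, not H3's actual flag layout:

```python
# Hypothetical flag layout: low 4 bits hold the containment mode,
# the next 3 bits hold a base-10 exponent for the cell threshold.
MODE_MASK = 0xF
THRESHOLD_SHIFT = 4
THRESHOLD_MASK = 0x7
DEFAULT_THRESHOLD = 10  # fallback when no exponent is supplied

def encode_flags(mode, threshold_exp=0):
    """Pack a containment mode and optional threshold exponent into flags."""
    return (mode & MODE_MASK) | ((threshold_exp & THRESHOLD_MASK) << THRESHOLD_SHIFT)

def threshold_from_flags(flags):
    """Recover the threshold: 10**exp, or the default if exp is 0."""
    exp = (flags >> THRESHOLD_SHIFT) & THRESHOLD_MASK
    return DEFAULT_THRESHOLD if exp == 0 else 10 ** exp

flags = encode_flags(2, threshold_exp=3)      # mode 2, threshold 10^3
print(threshold_from_flags(flags))            # 1000
print(threshold_from_flags(encode_flags(2)))  # 10 (default)
```

An exponent field keeps the flag footprint tiny (3 bits covers thresholds up to 10^7) while still letting callers trade estimate precision for speed.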
Co-authored-by: David Ellis <[email protected]>
Adds an alternate implementation of `maxPolygonToCellsSize`, based on a coarse run of `polygonToCellsExperimental`. This approach is slower than the current `maxPolygonToCellsSize`, but should accommodate the new modes and fix cases where we were previously underestimating with the new polyfill algo around the poles. It generally provides a much better ratio of memory allocation to memory use.

Algorithm Overview
- Start with the `polygonToCellsCompact` iterator, without taking the first step.
- Coarsen the starting resolution until the polygon is estimated to cover fewer than `MAX_SIZE_CELL_THRESHOLD` cells (threshold is currently set to 100, which is a little arbitrary), or until res `0` is reached.
- Run the iterator at the coarse res in the new `CONTAINMENT_OVERLAPPING_BBOX` mode (like `CONTAINMENT_OVERLAPPING`, but checks the cell bounding box rather than the boundary, for about a 2-3x perf improvement). For each output cell, add the count of its children at the target res to the output estimate.

Results
Perf can be tuned via the `MAX_SIZE_CELL_THRESHOLD` value, at the cost of precision. Benchmarks:

If it seems useful, I can try different values for `MAX_SIZE_CELL_THRESHOLD`, which I assume will allow us to trade some of the ratio advantage for faster perf.

TODO
- Update the `polygonToCellsExperimental` tests
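The estimation step in the overview can be sketched generically: run the coarse polyfill, then for each coarse cell add the number of its descendants at the target res. A minimal model assuming pure hexagons (in H3's aperture-7 grid a hexagon has 7 children per resolution step; pentagons have 6, so this slightly overestimates, which is safe for a max-size bound - the real library exposes `cellToChildrenSize` for the exact count):

```python
def children_count(parent_res, target_res):
    # Pure-hexagon model: each resolution step down multiplies the
    # cell count by 7. Pentagon cells actually have 6 children, so
    # this is an upper bound, which is what a max-size estimate needs.
    return 7 ** (target_res - parent_res)

def estimate_max_cells(coarse_res, num_coarse_cells, target_res):
    # Every coarse cell returned by the bbox-mode polyfill contributes
    # all of its descendants at the target res to the estimate.
    return num_coarse_cells * children_count(coarse_res, target_res)

# e.g. 12 coarse cells at res 4, target res 7: 12 * 7^3 = 4116
print(estimate_max_cells(4, 12, 7))  # 4116
```

This also shows why the threshold matters: a smaller coarse cell count means fewer, larger cells, so each false positive at the coarse res inflates the estimate by a full `7 ** (target_res - coarse_res)` block.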